The STL provides a collection of templates representing containers, iterators, function objects, and algorithms. A container is a unit, like an array, that can hold several values. STL containers are homogeneous; that is, they hold values all of the same kind. Algorithms are recipes for accomplishing particular tasks, such as sorting an array or finding a particular value in a list. Itertors are objects that let you move through a container much as pointers let you move through an arry; they are generalizations of pointers. Function objects are objects that act like functions; they can be class objects or function pointers(including function names because a function name acts as a pointer). The STL lets you construct a variety of containers, including arrays, queues, and lists, and it lets you perform a variety of operations, including searching, sorting, and randomizing.
There's too much information about the STL to present, so we'll look at some representative examples and examine the spirit of the generic programming approach. When you have a hands-on appreciation for containers, iterators and algorithms, we'll look at the underlying design philosophy and then take an overview of the whole STL.
A computing-style vector holds a set of like values that can be accessed randomly. That is, you can see, say, an index to directly access the 10th element of a vector without having to access the preceding 9 elements first. That is, you could create a vector object, assign one vector object to another, and use the [] operator to access vector elements. To make the class generic, you make it a template class. That's what the STL does, defining a vector template in the vector(formerly vector.h) header file.
To create a vector template object, you use the uaual <type> notation to indicate the type to be used. Also the vector template uses dynamic memory allocation, and you can use an initialization argument to indicate how many vector elements you want:
#include vector using namespace std; vector<int> rating(5); // a vector of 5 ints int n; cin >> n; vector<double> scores(n); // a vector of n doubles
After you create a vector object, operator overloading for [] makes it possible to use the usual array notation for accessing individual elements:
ratings[0] = 9; for(int i=0; i<n; i++) cout << scores[i] <<endl
Below is a example that uses this class in an undemanding application. This particular program creates two vector objects, one an int specilization and one a string specialization; each has five elements.
#include <iostream> #include <string> #include <vector> using namespace std; const int NUM = 5; int main() { vector<int> ratings(NUM); vector<string> titles(NUM); cout << "You will do exactly as told. You will enter\n" << NUM << " book titles and your ratings (0-10).\n"; int i; for (i = 0; i < NUM; i++) { cout << "Enter title #" << i + 1 << ": "; getline(cin,titles[i]); cout << "Enter your rating (0-10): "; cin >> ratings[i]; cin.get(); } cout << "Thank you. You entered the following:\n" << "Rating\tBook\n"; for (i = 0; i < NUM; i++) { cout << ratings[i] << "\t" << titles[i] << endl; } cin.get(); return 0; } //output: /* You will do exactly as told. You will enter 5 book titles and your ratings (0-10). Enter title #1: book1 Enter your rating (0-10): 4 Enter title #2: book2 Enter your rating (0-10): 5 Enter title #3: book3 Enter your rating (0-10): 2 Enter title #4: book4 Enter your rating (0-10): 3 Enter title #5: book5 Enter your rating (0-10): 4 Thank you. You entered the following: Rating Book 4 book1 5 book2 2 book3 3 book4 4 book5 */
All this program does is use the vector template as a convenient way to create a dynamically allocated array. The next section shows an example that uses more of the class methods.
Besides allocating storage, what else can the vector template do for you? All the STL containers provide certain basic methods, including size(), which returns the number of elements in a container, swap(), which exchanges the contents of two containers, begin(), which returns an iterator that refers to the first element in a container, and end(), which returns an iterator that represents past-the-end for the containor.
What's an iterator? It's generalization of a pointer. In fact, it can be a pointer. Or it can be an object for which pointer-like operations such as dereferencing(for example, operator*()) and incrementing(for example, operator++()) have been defined. As you'll see later, generalizing pointers to itertors allows the STL to provide a uniform interface for a variety of container classes, including ones for which simple pointers wouldn't work. Each container class defines a suitable iterator. The type name for this iterator is a class scope typedef called iterator. For example, to declare an iterator for a type double specialization of vector, you would use this:
vector<double>::iterator pd; //pd an iterator
Suppose scores is a vector<double> object:
vector<double> scores;
Then you can use the iterator pd in code like the following:
pd = scores.begin(); // have pd points to the first element *pd = 22.3; // dereference pd and assign value to first element ++pd; // make pd point to the next element
You can display the contents with this code:
for(pd = scores.begin(); pd != scores.end(); pd++) cout << *pd <<end;
All containers have the methods just discussed. The vector template class also has some methods that only some STL containers have. One handy method, called push_back(), adds an element to the end of a vector. While doing so, it attends to memory managements so that the vector size increases to accommodate added members. This means you can write code like the following:
vector<double> scores; double temp; while(cin >> temp && temp >=0) scores.push_back(temp); cout < < "You entered" < < scores.size() < < " scores.\n";
Below is the example illustrates some methods.
// vect2.cpp -- methods and iterators #include <iostream> #include <string> #include <vector> using namespace std; struct Review { string title; int rating; }; bool FillReview(Review & rr); void ShowReview(const Review & rr); int main() { vector<Review> books; Review temp; while (FillReview(temp)) books.push_back(temp); int num = books.size(); if (num > 0) { cout << "Thank you. You entered the following:\n" << "Rating\tBook\n"; for (int i = 0; i < num; i++) ShowReview(books[i]); cout << "Reprising:\n" << "Rating\tBook\n"; vector<Review>::iterator pr; for (pr = books.begin(); pr != books.end(); pr++) ShowReview(*pr); vector <Review> oldlist(books); // copy constructor used if (num > 3) { // remove 2 items books.erase(books.begin() + 1, books.begin() + 3); cout << "After erasure:\n"; for (pr = books.begin(); pr != books.end(); pr++) ShowReview(*pr); // insert 1 item books.insert(books.begin(), oldlist.begin() + 1, oldlist.begin() + 2); cout << "After insertion:\n"; for (pr = books.begin(); pr != books.end(); pr++) ShowReview(*pr); } books.swap(oldlist); cout << "Swapping oldlist with books:\n"; for (pr = books.begin(); pr != books.end(); pr++) ShowReview(*pr); } else cout << "Nothing entered, nothing gained.\n"; cin.get(); cin.get(); return 0; } bool FillReview(Review & rr) { cout << "Enter book title (quit to quit): "; getline(cin,rr.title); if (rr.title == "quit") return false; cout << "Enter book rating: "; cin >> rr.rating; if (!cin) return false; // get rid of rest of input line while (cin.get() != '\n') continue; return true; } void ShowReview(const Review & rr) { cout << rr.rating << "\t" << rr.title << endl; } /* output: Enter book title (quit to quit): book1 Enter book rating: 4 Enter book title (quit to quit): book2 Enter book rating: 5 Enter book title (quit to quit): book3 Enter book rating: 3 Enter book title (quit to quit): book4 Enter book rating: 5 Enter book title (quit to quit): book5 Enter book rating: 4 Enter book title (quit to quit): quit Thank you. You entered the following: Rating Book 4 book1 5 book2 3 book3 5 book4 4 book5 Reprising: Rating Book 4 book1 5 book2 3 book3 5 book4 4 book5 After erasure: 4 book1 5 book4 4 book5 After insertion: 5 book2 4 book1 5 book4 4 book5 Swapping oldlist with books: 4 book1 5 book2 3 book3 5 book4 4 book5 */
Understanding iterators is perhaps the key to understanding the STL. Just as templates make algorithms independent of the type of data stored, iterators make the algorithms independent of the type of container used. Thus, they are an essential component of the STL's generic approch.
To see why iterators are needed, let's look at how you might implement a find function for two different data representations and then see how you could generalize the approach. First, let's consider a function that searches an ordianry array of double for a particular value. You could write the function like this:
double * find_ar(double *ar, int n, const double & val) { for (int i=0; i<n; i++) if (ar[i]==val) return &ar[i]; return 0; }
If the function finds the value in the array, it returns the address in the array where the value is found; otherwise, it returns the null pointer. It uses subscript notation to move through the array. You could use a template to generalize to arrays of any type having an == operator. Nonetheless, this algorithm is still tried to one particular data structure-the array.
So let's look at searching another kind of data structure, the linked list. The list consists of linked Node structures:
struct Node { double item; Node * p_next; };
Suppose you have a pointer that points to the first node in the list. The p_next pointer in each node points the next node, and the p_next pointer for the last node in the list is set to 0. You could write a find-11() function this way:
Node * find_11(Node * head, const double & val) { Node * start; for (start = head; start!=0; start = start->p_next) if(star->item == val) return start; return 0; }
Again, you could use a template to generalize this to lists of any data type supporting the == operator. Nonetheless, this algorithm is still tied to one particular data structure-the linked list.
If you consider details of implementation, the two find functions use different algorithm:One uses array indexing to move through is a list of items, and the other resets start to start->p_next. But broadly, the two algroithms are the same: Compare the value with each value in the container in sequence until you find a match.
The goal of generic programming in this case would be to have a single find function that would work with arrays or linked lists or any other container type. That is, not only should the function be independent of the data type stored in the container, it should be independent of the data structure of the container itself. Templates provide a generic representation for the data type stored in a container. What's needed is a generic representation of the process of moving through the values in a container. The iterator is that generalized representation.
What properties should an iterator have in order to implement a find function? Here's a short list:
You should be able to dereference an iterator in order to access the value to which it refers. That is, if p is an iterator, *p should be defined.
You should be able to assign one iterator to another. That is, if p and q are iterators, the expression p = q should be defined.
You should be able to compare one iterator to another for equlity. That is, if p and q are iterators, the expressions p == q and p != q should be defined.
You should be able to move an iterator through all the elements of a containor. This can be satisfied by defining ++p and p++ for an iterator p.
There are more things an iterator could do, but nothing more it need to-at least, not for the purposes of a find functions. Actually, the STL defines several levels of iterators of increasing capabilities, and we'll return to that matter later. Note, by the way, that an ordianry pointer meets the requirements of an iterator. Hence, you can rewrite the find_arr() function like this:
typedef double * iterator; iterator find_ar(iterator ar, int n, const double & val) { for(int i=0; i<n; i++, arr++) if(*ar == val) return ar; return 0; }
Then you can alter the function parameter list so that it takes a pointer to the beginning of the array and a pointer to one past-the-end of the array as arguments to indicate a range. And the function can return the end pointer as a sign the value was not found. The following version of find_ar() makes these changes:
typedef double * iterator; iterator find_ar(iterator begin, iterator end, const double & val) { for(ar = begin; ar !=end; arr++) if(*ar == val) return ar; return end; }
for the find_11() function, you can define an iterator class that defines the * and ++ operators:
struct Node { double item; Node * p_next; }; class iterator { Node * pt; public: iterator(): pt(0){} iterator(Node * pn):pt(pn){} double operator*(){return pt->item;} iterator& operator++() //for ++it { pt = pt->p_next; return *this; } iterator operator++(int) //for it++ { iterator tmp = *this; pt = pt->p_next; return tmp; } //...opertor==(),opertor!=(),etc. };
The main point here is not how, in detail, to define the iterator class, but that with such a class, the second find function can be written like this:
iterator find_11(iterator begin, const double & val) { for(start=head; start != 0; ++start) if(*start == val) return start; return 0; }
This is very nearly the same as find_ar(). The point of difference is in how the two functions determine whether they've reached the end of the values being searched. The find_ar() function uses an iterator to one-past-the-end, whereas find_11() uses a null value stored in the final node. Remove that difference, and you can make the two functions identical. For example, you could require that the linked list have one additional element after the last officail element. That is, you could have both the array and the linked list have a past-the-end element, and you could end the search when the iterator reaches the past-the-end position. Then find_ar() and find_11() would have the same way of detecting the end of data and become identical algorithms. Note that requiring a past-the-end element moves from making requirements on iterators to making requirements on the container class.
The STL follows the approach just outlined. First, each container class(vector,list,deque,and so on) defines an iterator type appropriate to the class. For one class, the iterator might be a pointer; for another, it might be an object. Whatever the implementation, the iterator will provide the needed operations, such as * and ++. (Some classes may need more operations than others.) Next, each container class will have a past-the-end marker, which is the value assigned to an iterator when it has been incremented one past the last value in the container. Each container class will have begin() and end() methods that return iterators to the first element in a container and to the past-the-end position. And each container class will have the ++ operation take an iterator from the first element to past-the-end, visiting every container element en route.
To use a container class, you don't need to know how its iterators are implemented nor how past-the-end is implemented. It's enough to know that it does have iterators, that begin() returns an iterator to the first element, and that end() returns an iterator to past-the-end. For example, suppose you want to print the values in a vector<double> oject. In that case, you cna use this:
vector<double>::iterator pr; for(pr=scores.begin(); pr != scores.end(); pr++) cout << *pr <<end;
Here the following line identifies pr as the iterator type defined for the
vector<double> class; vector<double>::iterator pr;
If you used the list<double> class template instead to scores, you could use this code:
list<double>::iterator pr; for(pr=scores.begin(); pr != scores.end(); pr++) cout << *pr <<end;
The only change is in the type declared for pr. Thus,by having each class define appropriate iterators and designing the classes in a uniform fashion, the STL lets you write the same code for containers that have quite dissimilar internal representations.
With C++ automatic type deduction, you can simplify further and use the following code with either the vector or the list:
for(auto pr=scores.begin(); pr != scores.end(); pr++) cout << *pr <<endl;
Actually, as a matter of style, it's better to avoid using the iterators directly; instead, if possible, you should use an STL function, such as for_each(), that takes care of the details for you. Alternatively, use the C++11 range-based for loop:
for(auto x:scores) cout <<x << endl;
So to summarize the STL approach, you start with an algorithm for processing a container. You express it in as general terms as possible, making it independent of data type and container type. To make the general algorithm work with specific cases, you define iterators that meet the needs of the algorithm and place requirements on the container design. That is, basic iterator properties and container properties stem from requirements placed on the algorithm.
Different algorithms have different requirements for iterators. for example, a find algorithm needs the ++ operator to be defined so the iterator can step through the entire container. It needs read access to data but not write access.(It just looks at data and doesn't change it.) The usual sorting algorithm, on the other hand, requires random access so that it can swap two non-adjacent elements. If iter is an iterator, you can get random access by defining the + operator so that you can use expressions such as iter + 10. Also a sort algorithm needs to be able to both read and write data.
The STL defines five kinds of iterators and describes its algorithms in terms of which kinds of iterators it needs. The five kinds are the input iterator, output iterator, forward iterator, bidirectional iterator, and random access iterator. For example, the find() prototype looks like this:
template<class InputIterator, class T> InputIterator find(InputIterator first, InputIterator last, const T& value);
This tells you that this algorithm requires an input iterator. Similarly, the following prototype tells you that the sort algorithm requires a random access iterator:
template<class RandomAcessIterator, class T> RandomAcessIterator find(RandomAcessIterator first, RandomAcessIterator last, const T& value);
All five kinds of iterators can be dereferenced(that is, the * operator is defined for them) and can be compared for equality(using the == opertor, possibly overloaded) and inequality(using the != operator, possibly overloaded). If two iterators test as equal, then dereferencing one should produce the same value as dereferencing the second. That is, if
iter1 == iter2
is true, then the following is also true:
*iter1 == *iter2
Of course, these properties hold true for built-in operators and pointers, so these requirements are guides for what you must do when overloading these operators for an iterator class. Now let's look at other iterator properties.
The term input is used from the viewpoint of a program. That is, information going from the containor to the program is considered input, just as information from a keyboard to the program is considered input. So an input iterator is one that a program can use to read values from a container. In particular, dereferencing an input iterator must allow a program to read a value from a container, but it needn't allow a program to alter value. So algorithms that require an input iterator are algorithms that don't change values held in a container.
An input iterator has to allow you to access all the values in a container. It does so by supporting the ++ operator, both in prefix and suffix form. If you set an input operator to the first element in a container and increment it until it reaches past-the-end, it will point to every container item once en route. Incidentally there is no guarantee that traversing a container a second time with an input iterator will move through the values in the same order. Also after an input iterator has been incremented, there is no guarantee that its prior value can still be dereferenced. Any algorithm based on an input iterator, then, should be a single-pass algorithm that doesn't rely on iterator values from a previous pass or on earlier itertor values from the same pass.
Note that an input iteraotr is a one-way iterator; it can increment, but it can't back up.
In STL usage, the term output indicates that the iterator is used for transferring information from a program to a container.(Thus the output for the program is input for the container.) An output iterator is similar to an input iterator, except that dereferencing is graranteed to allow a program to alter a container value but not to read it. If the ability to write without reading seems strange, keep in mind that this property also applies to output sent to your display; cout can modify the stream of characters sent to the display, but it can't read what's onscreen. The STL is general enough that its containers can represent output devices, so you can run into the same situation with containers. Also if an algorithm modifies the contents of a container(for example, by generating new values to be stored) without reading the contents, there's no reason to require that it use an iterator that can read the contents.
In short, you can use an input iterator for single-pass, read-only algortithms and an output operator for single-pass, write-only algorithms.
Like input and output iterators, forward iterators use only the ++ operators for navigating through a container. So a forward iterator can only go forward through a container one element at a time. However, unlike input and output iterators, it necessarily goes through a sequence of values in the same order each time you use it. Also after you increment a forward iterator. you can still dereference the prior iterator value, if you've saved it, and get the same value. These properties make multiple-pass algorithms possible.
A forward iterator can allow you to both read and modify data, or it can allow you just to read it:
int * pirw; // read-write iterator const int * pir; // read-only iterator
Suppose you have an algorithm that needs to be able to traverse a containor in both directions. For example, a reverse function could swap the first and last elements, increment the pointer to the first element, decrement the pointer to a second element, and repeat the process. A bidirectional iterator has all the features of a forward iterator and adds support for the two decrement operators(prefix and postfix).
Some algorthms, such as standard sort and binary search, require the ability to jump directly to an arbitrary element of a container. This is termed random access, and it requires a random access iterator. This type of iterator has all the features of a bidirectional iterator, plus it adds operations(such as pointer addition) that support random access and relational operators for ordering the elements.Below is the table that lists the operations a random access iterator has beyond those of a bidirectional iterator. In this table, X represent a random iterator type, T represents te type pointed to, a and b are iterator values, n is an integer, and r is a random iterator variable or reference.
Expression | Description |
a+n | Points to the nth element after the one a points to |
n+a | Same as a+n |
a-n | Points to the nth element before the one a points to |
r+=n | Equivalent to r = r+n |
r-=n | Equivalent to r = r-n |
a[n] | Equivalent to *(a+n) |
b-a | The value of n such that b == a+n |
a<b | Ture if b-a>0 |
a>b | Ture if b<a |
a>=b | Ture if !(b<a) |
a<=b | Ture if !(b>a) |
Expressions such as a + n are valid only if both a and a+n lie within the range of the container(including past-the-end).
You have probably noticed that the iterator kinds form a hierarchy. A forward iterator has all the capabilities of an input iterator and of an output iterator, plus its own capabilities. A bidirectional iterator has all the capabilities of a forward iterator, plus its own capabilities. And a random access iterator has all the capabilities of a forward iterator, plus its own capabilities. Below is the table that summarizes the main iterator capabilities. In it, i is an iterator, and n is an integer.
Iterator Capability | Input | Output | Forward | Bidirectional | Random Access |
Dereferencing read | Yes | No | Yes | Yes | Yes |
Dereferencing write | No | Yes | Yes | Yes | Yes |
Fixed and repeatable order | No | No | Yes | Yes | Yes |
++i、i++ | Yes | Yes | Yes | Yes | Yes |
--i、i-- | No | No | No | Yes | Yes |
i[n] | No | No | No | No | Yes |
i + n | No | No | No | No | Yes |
i - n | No | No | No | No | Yes |
i += n | No | No | No | No | Yes |
i -+ n | No | No | No | No | Yes |
An algorithm written in terms of a particular kind of iterator can use the kind of iterator or any other iterator that has the required capabilities. So a container with, say, a random access iterator can use an algorithm written for an input iterator.
Why all these different kinds of iterator? The idea is to write an algorithm using the iterator with the fewest requirements possible, allowing it to be used with the largest range of containers. Thus, the find() function, by using a lowly input iterator, can be used with any container that contains readable values. The sort() function, however, by requiring a random access iterator, can be used just with containers that upport that kind of iterator.
Note that the various iterator kinds are not defined types; rather, they are conceptual characterizations. As mentioned earlier, each container class defines a class scope typedef name called iterator. So the vector<int> class has iterators of type vector<int>::iterator. But the documentation for this class would tell you that vector iterators are random access iterators. That, in turn, allows you to use algorithms based on any iterator type because a random access iterator has all the iterator capabilities. Similarly, a list<int> class has iterators of type list<int>::iterator. The STL implements a doubly linked list, so it uses a bidirectional iterator. Thus, it can't use algorithms based on random access iterators, but it can use algorithms based on less demanding iterators.